A SIMD Sparse Matrix-Vector Multiplication Algorithm for Computational Electromagnetics and Scattering Matrix Models
Author
Abstract
Kipadia, Nirav Harish. M.S.E.E., Purdue University. May 1994. A SIMD Sparse Matrix-Vector Multiplication Algorithm for Computational Electromagnetics and Scattering Matrix Models. Major Professor: Jose Fortes. A large number of problems in numerical analysis require the multiplication of a sparse matrix by a vector. In spite of the large amount of fine-grained parallelism available in the process of sparse matrix-vector multiplication, it is difficult to design an algorithm for distributed memory SIMD computers that can efficiently multiply an arbitrary sparse matrix by a vector. The difficulty lies in the irregular nature of the data structures required to efficiently store arbitrary sparse matrices, and the architectural constraints of a SIMD computer. We propose a new algorithm that allows the "regularity" of a data structure that uses a row-major mapping to be varied by changing a parameter (the block size). The block row algorithm assumes that the number of non-zero elements in each row is a multiple of the block size; additional zero entries are stored to satisfy this condition. The block size can be varied from one to N, where N is the size of the matrix; a block size of one results in a row-major distribution of the non-zero elements of the matrix (no overhead of storing zero elements), while a block size of N results in a row-major distribution corresponding to that of a dense matrix. The algorithm was implemented on a 16,384-processor MasPar MP-1, and for the matrices associated with the applications considered here (S-Matrix Approach to Device Simulation, and the Modeling of Diffractive and Scattering Objects), the algorithm was faster than any of the other algorithms considered (the "snake-like" method, the "segmented-scan" method, and a randomized packing algorithm). For matrices that have a wide variation in the number of non-zero elements in each row, a procedure for an "adaptive" block row algorithm is briefly mentioned. The block row algorithm is applicable to unstructured sparse matrices which have relatively sparse columns (dense rows are not a problem), and it can be implemented on any distributed memory computer.
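Below is a minimal serial sketch of the padded block-row storage idea the abstract describes, assuming NumPy for brevity; the helper names `to_block_row` and `block_row_spmv` are hypothetical, and the sketch shows only the padding and row-major packing of non-zero entries, not the MasPar MP-1 data distribution.

```python
import numpy as np

def to_block_row(A_dense, block_size):
    """Pack a matrix into a padded block-row layout (hypothetical helper).

    Each row's non-zero entries are padded with explicit zeros so that
    their count is a multiple of block_size, as the abstract describes.
    Returns flat arrays of values and column indices plus per-row counts.
    """
    values, cols, row_counts = [], [], []
    for row in A_dense:
        nz = np.flatnonzero(row)
        pad = (-len(nz)) % block_size          # zeros needed to reach a multiple
        row_counts.append(len(nz) + pad)
        values.extend(row[nz])
        values.extend([0.0] * pad)
        cols.extend(nz)
        cols.extend([0] * pad)                 # padded entries point at column 0
    return np.array(values), np.array(cols), np.array(row_counts)

def block_row_spmv(values, cols, row_counts, x):
    """Multiply the packed matrix by vector x, one padded row at a time."""
    y = np.zeros(len(row_counts))
    start = 0
    for i, count in enumerate(row_counts):
        idx = slice(start, start + count)
        y[i] = np.dot(values[idx], x[cols[idx]])  # padded zeros contribute nothing
        start += count
    return y
```

With a block size of one no padding zeros are stored, while larger block sizes trade extra stored zeros for a more regular layout, mirroring the tunable "regularity" the abstract refers to.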
Similar resources
Breaking the performance bottleneck of sparse matrix-vector multiplication on SIMD processors
The low utilization of SIMD units and memory bandwidth is the main performance bottleneck on SIMD processors for sparse matrix-vector multiplication (SpMV), which is one of the most important kernels in many scientific and engineering applications. This paper proposes a hybrid optimization method to break the performance bottleneck of SpMV on SIMD processors. The method includes a new sparse ma...
A Unified Sparse Matrix Data Format for Efficient General Sparse Matrix-Vector Multiplication on Modern Processors with Wide SIMD Units | SIAM Journal on Scientific Computing | Vol. 36, No. 5 | Society for Industrial and Applied Mathematics
Sparse matrix-vector multiplication (spMVM) is the most time-consuming kernel in many numerical algorithms and has been studied extensively on all modern processor and accelerator architectures. However, the optimal sparse matrix data storage format is highly hardware-specific, which could become an obstacle when using heterogeneous systems. Also, it is as yet unclear how the wide single instru...
Run-Time Optimization of Sparse Matrix-Vector Multiplication on SIMD Machines
Sparse matrix-vector multiplication forms the heart of iterative linear solvers used widely in scientific computations (e.g., finite element methods). In such solvers, the matrix-vector product is computed repeatedly, often thousands of times, with updated values of the vector until convergence is achieved. In an SIMD architecture, each processor has to fetch the updated off-processor vector el...
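As an illustration of the repeated sparse matrix-vector products this excerpt refers to, here is a small sketch of a Jacobi iteration in Python, assuming SciPy's CSR format; it shows only the access pattern (one SpMV per sweep with the updated vector) and is not the run-time optimization that paper proposes.

```python
import numpy as np
import scipy.sparse as sp

def jacobi_solve(A, b, tol=1e-8, max_iter=1000):
    """Plain Jacobi iteration on a SciPy CSR matrix.

    Each sweep re-evaluates the sparse matrix-vector product A @ x with
    the updated iterate, the repeated-SpMV pattern described above.
    """
    D = A.diagonal()                     # assumes no zero diagonal entries
    x = np.zeros_like(b, dtype=float)
    for _ in range(max_iter):
        r = b - A @ x                    # SpMV kernel, executed every sweep
        x_new = x + r / D
        if np.linalg.norm(x_new - x) <= tol * np.linalg.norm(x_new):
            return x_new
        x = x_new
    return x

# Example: a small diagonally dominant system (illustrative values only).
A = sp.csr_matrix(np.array([[4.0, 1.0, 0.0],
                            [1.0, 5.0, 2.0],
                            [0.0, 2.0, 6.0]]))
b = np.array([1.0, 2.0, 3.0])
x = jacobi_solve(A, b)
```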
Efficient Multicore Sparse Matrix-Vector Multiplication for Finite Element Electromagnetics on the Cell-BE processor
Multicore systems are rapidly becoming a dominant industry trend for accelerating electromagnetics computations, driving researchers to address parallel programming paradigms early in application development. We present a new sparse representation and a two level partitioning scheme for efficient sparse matrix-vector multiplication on multicore systems, and show results for a set of finite elem...